Crate email_address
A Rust crate providing an implementation of an RFC-compliant EmailAddress newtype.
Primarily for validation, the EmailAddress type is constructed with FromStr::from_str, which returns any parsing errors. Prior to construction, the functions is_valid, is_valid_local_part, and is_valid_domain may also be used to test for validity without constructing an instance. This supports all of the RFC ASCII and UTF-8 character set rules, and both quoted and unquoted local parts, but does not yet support all of the productions required for SMTP headers, such as folding whitespace and comments.
"Simon Johnston <johnstonsk@gmail.com>"
                 ^------------------^ email()
                            ^-------^ domain()
                 ^--------^ local_part()
 ^------------^ display_part()
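To make the spans in the diagram concrete, here is a minimal std-only sketch that slices a display string into the same four pieces. The `Parts` struct and `split_display` function are hypothetical names invented for this illustration and are not part of the crate's API. The sketch assumes a plain `Name <local@domain>` shape with no quoting or comments.

```rust
/// Hypothetical container mirroring the four spans in the diagram.
struct Parts<'a> {
    display_part: &'a str,
    email: &'a str,
    local_part: &'a str,
    domain: &'a str,
}

/// Splits `Name <local@domain>` into its parts.
/// Assumption: no quoted strings, comments, or nested angle brackets.
fn split_display(s: &str) -> Option<Parts<'_>> {
    // Everything before the first '<' is the display name.
    let (display, rest) = s.split_once('<')?;
    // The address must be terminated by a closing '>'.
    let email = rest.strip_suffix('>')?;
    // Split the address at the last '@'.
    let (local_part, domain) = email.rsplit_once('@')?;
    Some(Parts {
        display_part: display.trim_end(),
        email,
        local_part,
        domain,
    })
}
```

This only illustrates which substring each accessor name refers to. It performs no validation of the individual parts.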
§Example
The following shows the basic is_valid and from_str functions.
use email_address::*;
use std::str::FromStr;

assert!(EmailAddress::is_valid("user.name+tag+sorting@example.com"));

assert_eq!(
    EmailAddress::from_str("Abc.example.com"),
    Error::MissingSeparator.into()
);
The following shows the three format functions used to output an email address.
use email_address::*;
use std::str::FromStr;

let email = EmailAddress::from_str("johnstonsk@gmail.com").unwrap();

assert_eq!(
    email.to_string(),
    "johnstonsk@gmail.com".to_string()
);

assert_eq!(
    String::from(email.clone()),
    "johnstonsk@gmail.com".to_string()
);

assert_eq!(
    email.as_ref(),
    "johnstonsk@gmail.com"
);

assert_eq!(
    email.to_uri(),
    "mailto:johnstonsk@gmail.com".to_string()
);

assert_eq!(
    email.to_display("Simon Johnston"),
    "Simon Johnston <johnstonsk@gmail.com>".to_string()
);
§Specifications
- RFC 1123: Requirements for Internet Hosts – Application and Support, IETF, Oct 1989.
- RFC 3629: UTF-8, a transformation format of ISO 10646, IETF, Nov 2003.
- RFC 3696: Application Techniques for Checking and Transformation of Names, IETF, Feb 2004.
- RFC 4291: IP Version 6 Addressing Architecture, IETF, Feb 2006.
- RFC 5234: Augmented BNF for Syntax Specifications: ABNF, IETF, Jan 2008.
- RFC 5321: Simple Mail Transfer Protocol, IETF, Oct 2008.
- RFC 5322: Internet Message Format, IETF, Oct 2008.
- RFC 5890: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework, IETF, Aug 2010.
- RFC 6531: SMTP Extension for Internationalized Email, IETF, Feb 2012.
- RFC 6532: Internationalized Email Headers, IETF, Feb 2012.
From RFC 5322: §3.2.1. Quoted characters:
quoted-pair = ("\" (VCHAR / WSP)) / obs-qp
From RFC 5322: §3.2.2. Folding White Space and Comments:
FWS             =   ([*WSP CRLF] 1*WSP) / obs-FWS
                                       ; Folding white space
ctext           =   %d33-39 /          ; Printable US-ASCII
                    %d42-91 /          ;  characters not including
                    %d93-126 /         ;  "(", ")", or "\"
                    obs-ctext
ccontent        =   ctext / quoted-pair / comment
comment         =   "(" *([FWS] ccontent) [FWS] ")"
CFWS            =   (1*([FWS] comment) [FWS]) / FWS
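Because `comment` is defined recursively through `ccontent`, comments can nest. The following is a minimal sketch of a recognizer for a single comment, written only against the rules quoted above. It is not the crate's implementation. The names `is_ctext` and `is_comment` are hypothetical. The sketch simplifies FWS to single spaces and tabs and omits the obsolete `obs-ctext` and `obs-qp` forms.

```rust
/// ctext = %d33-39 / %d42-91 / %d93-126 (obs-ctext omitted in this sketch).
fn is_ctext(b: u8) -> bool {
    matches!(b, 33..=39 | 42..=91 | 93..=126)
}

/// Returns true if `s` is exactly one well-formed, possibly nested, comment.
/// Assumption: FWS is reduced to plain spaces and tabs (no CRLF folding).
fn is_comment(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut depth = 0usize; // current nesting level of parentheses
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'(' => depth += 1,
            b')' => {
                if depth == 0 {
                    return false; // closing parenthesis without an opener
                }
                depth -= 1;
                // The outermost comment must end exactly at the end of input.
                if depth == 0 && i + 1 != bytes.len() {
                    return false;
                }
            }
            // Any other byte is only allowed inside a comment.
            _ if depth == 0 => return false,
            b'\\' => match bytes.get(i + 1) {
                // quoted-pair = "\" (VCHAR / WSP): skip the escaped byte.
                Some(&next) if matches!(next, 33..=126 | b' ' | b'\t') => i += 1,
                _ => return false,
            },
            b' ' | b'\t' => {}
            b if is_ctext(b) => {}
            _ => return false,
        }
        i += 1;
    }
    depth == 0 && !bytes.is_empty()
}
```

A depth counter is enough here because the only recursive construct is the parenthesis pair; a full CFWS parser would also need to handle line folding.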
From RFC 5322: §3.2.3. Atom:
atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials. Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"
atom            =   [CFWS] 1*atext [CFWS]
dot-atom-text   =   1*atext *("." 1*atext)
dot-atom        =   [CFWS] dot-atom-text [CFWS]
specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE
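The `atext` and `dot-atom-text` rules translate almost directly into code. The sketch below is one possible std-only rendering of those two productions for ASCII input. The function names `is_atext` and `is_dot_atom_text` are hypothetical and chosen for this illustration. Surrounding CFWS is not handled.

```rust
/// Returns true if `c` is in the ASCII `atext` set from RFC 5322 §3.2.3.
fn is_atext(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-/=?^_`{|}~".contains(c)
}

/// Returns true if `s` matches `dot-atom-text = 1*atext *("." 1*atext)`.
fn is_dot_atom_text(s: &str) -> bool {
    // Each dot-separated segment must be a non-empty run of atext, which
    // rejects leading dots, trailing dots, and consecutive dots.
    s.split('.')
        .all(|segment| !segment.is_empty() && segment.chars().all(is_atext))
}
```

Under these rules `user.name+tag` is accepted, while `.user`, `user..name`, and the empty string are rejected.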
From RFC 5322: §3.2.4. Quoted Strings:
qtext           =   %d33 /             ; Printable US-ASCII
                    %d35-91 /          ;  characters not including
                    %d93-126 /         ;  "\" or the quote character
                    obs-qtext
qcontent        =   qtext / quoted-pair
quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
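As an illustration of the `quoted-string` rule, the following sketch checks an ASCII quoted string byte by byte. It is written against the grammar above, not taken from the crate. The names `is_qtext` and `is_quoted_string` are hypothetical. The sketch ignores the surrounding CFWS, treats FWS as plain spaces and tabs, and omits `obs-qtext`.

```rust
/// qtext = %d33 / %d35-91 / %d93-126 (obs-qtext omitted in this sketch).
fn is_qtext(b: u8) -> bool {
    matches!(b, 33 | 35..=91 | 93..=126)
}

/// Returns true if `s` is `DQUOTE *([FWS] qcontent) [FWS] DQUOTE`.
/// Assumption: FWS is reduced to plain spaces and tabs, and CFWS is ignored.
fn is_quoted_string(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 2 || bytes[0] != b'"' || bytes[bytes.len() - 1] != b'"' {
        return false;
    }
    let inner = &bytes[1..bytes.len() - 1];
    let mut i = 0;
    while i < inner.len() {
        match inner[i] {
            b'\\' => match inner.get(i + 1) {
                // quoted-pair = "\" (VCHAR / WSP): consume both bytes.
                Some(&next) if matches!(next, 33..=126 | b' ' | b'\t') => i += 2,
                _ => return false, // dangling backslash or invalid escape
            },
            b' ' | b'\t' => i += 1,
            b if is_qtext(b) => i += 1,
            _ => return false, // unescaped quote, control byte, or non-ASCII
        }
    }
    true
}
```

Note that an unescaped double quote inside the string is rejected, while the escaped form `\"` is accepted as a quoted-pair.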
From RFC 5322: §3.4.1. Addr-Spec Specification:
addr-spec       =   local-part "@" domain
local-part      =   dot-atom / quoted-string / obs-local-part
domain          =   dot-atom / domain-literal / obs-domain
domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"
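The `addr-spec` rule can be sketched as a split on `@` followed by separate checks on each side. The code below is a simplified, std-only illustration with hypothetical names (`split_addr_spec`, `is_dtext`, `is_domain_literal`). It splits at the last `@` so that a quoted local part containing `@` stays intact, which assumes the domain itself contains no `@`. It also ignores CFWS, FWS, and `obs-dtext`.

```rust
/// Splits `local-part "@" domain` at the last '@'.
/// Assumption: the domain contains no '@', so the last one is the separator.
fn split_addr_spec(address: &str) -> Option<(&str, &str)> {
    let (local_part, domain) = address.rsplit_once('@')?;
    if local_part.is_empty() || domain.is_empty() {
        return None;
    }
    Some((local_part, domain))
}

/// dtext = %d33-90 / %d94-126 (obs-dtext omitted in this sketch).
fn is_dtext(b: u8) -> bool {
    matches!(b, 33..=90 | 94..=126)
}

/// Returns true if `domain` is `"[" *dtext "]"`, ignoring CFWS and FWS.
fn is_domain_literal(domain: &str) -> bool {
    domain
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .map_or(false, |inner| inner.bytes().all(is_dtext))
}
```

With these helpers, `Abc.example.com` yields `None` because it has no separator, which corresponds to the missing-separator case shown in the earlier example.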
RFC 3696, §3. Restrictions on email addresses describes in detail the quoting of characters in an address.
§Unicode
RFC 6531, §3.3. Extended Mailbox Address Syntax extends the rules above for non-ASCII character sets.
sub-domain   =/  U-label
    ; extend the definition of sub-domain in RFC 5321, Section 4.1.2
atext        =/  UTF8-non-ascii
    ; extend the implicit definition of atext in
    ; RFC 5321, Section 4.1.2, which ultimately points to
    ; the actual definition in RFC 5322, Section 3.2.3
qtextSMTP    =/  UTF8-non-ascii
    ; extend the definition of qtextSMTP in RFC 5321, Section 4.1.2
esmtp-value  =/  UTF8-non-ascii
    ; extend the definition of esmtp-value in RFC 5321, Section 4.1.2
A “U-label” is an IDNA-valid string of Unicode characters, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8). It is also subject to the constraints about permitted characters that are specified in Section 4.2 of the Protocol document and the rules in the Sections 2 and 3 of the Tables document, the Bidi constraints in that document if it contains any character from scripts that are written right to left, and the symmetry constraint described immediately below. Conversions between U-labels and A-labels are performed according to the “Punycode” specification RFC3492, adding or removing the ACE prefix as needed.
RFC 6532: §3.1 UTF-8 Syntax and Normalization, and §3.2 Syntax Extensions to RFC 5322 extend the syntax above with:
UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
...
VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii
atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii
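In Rust, a `char` is a Unicode scalar value, so any non-ASCII `char` corresponds to a valid UTF8-2, UTF8-3, or UTF8-4 sequence once encoded. That makes the `atext =/ UTF8-non-ascii` extension simple to sketch. The function name `is_atext_utf8` is hypothetical and this is not the crate's own code.

```rust
/// Extended `atext`: the ASCII set from RFC 5322 §3.2.3 plus any
/// non-ASCII character, per the `atext =/ UTF8-non-ascii` rule.
fn is_atext_utf8(c: char) -> bool {
    // A non-ASCII `char` always encodes to a 2-, 3-, or 4-byte UTF-8
    // sequence, which is what UTF8-non-ascii describes.
    !c.is_ascii()
        || c.is_ascii_alphanumeric()
        || "!#$%&'*+-/=?^_`{|}~".contains(c)
}
```

The ASCII specials such as `@` and the space character remain excluded; only the non-ASCII range is added.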
These in turn refer to RFC 3629 §4. Syntax of UTF-8 Byte Sequences:
A UTF-8 string is a sequence of octets representing a sequence of UCS characters. An octet sequence is valid UTF-8 only if it matches the following syntax, which is derived from the rules for encoding UTF-8 and is expressed in the ABNF of [RFC2234].
UTF8-octets = *( UTF8-char )
UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1      = %x00-7F
UTF8-2      = %xC2-DF UTF8-tail
UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
              %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
              %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail   = %x80-BF
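The UTF-8 grammar above maps naturally onto Rust slice patterns, one match arm per alternative. The sketch below is an illustrative transcription with hypothetical names (`is_tail`, `utf8_char_len`, `is_valid_utf8`). In practice `std::str::from_utf8` performs this validation, and the sketch is only meant to show how the ABNF excludes overlong encodings, surrogates, and code points above U+10FFFF.

```rust
/// UTF8-tail = %x80-BF
fn is_tail(b: u8) -> bool {
    (0x80..=0xBF).contains(&b)
}

/// Returns the byte length of the UTF8-char at the start of `bytes`,
/// or None if no alternative of the grammar matches.
fn utf8_char_len(bytes: &[u8]) -> Option<usize> {
    match bytes {
        // UTF8-1
        [0x00..=0x7F, ..] => Some(1),
        // UTF8-2 (0xC0 and 0xC1 are excluded: they would be overlong)
        [0xC2..=0xDF, t, ..] if is_tail(*t) => Some(2),
        // UTF8-3 (0xE0 needs A0-BF to avoid overlongs; 0xED stops at 9F
        // to exclude the surrogate range)
        [0xE0, 0xA0..=0xBF, t, ..] if is_tail(*t) => Some(3),
        [0xE1..=0xEC, t1, t2, ..] if is_tail(*t1) && is_tail(*t2) => Some(3),
        [0xED, 0x80..=0x9F, t, ..] if is_tail(*t) => Some(3),
        [0xEE..=0xEF, t1, t2, ..] if is_tail(*t1) && is_tail(*t2) => Some(3),
        // UTF8-4 (0xF4 stops at 8F so nothing exceeds U+10FFFF)
        [0xF0, 0x90..=0xBF, t1, t2, ..] if is_tail(*t1) && is_tail(*t2) => Some(4),
        [0xF1..=0xF3, t1, t2, t3, ..]
            if is_tail(*t1) && is_tail(*t2) && is_tail(*t3) => Some(4),
        [0xF4, 0x80..=0x8F, t1, t2, ..] if is_tail(*t1) && is_tail(*t2) => Some(4),
        _ => None,
    }
}

/// UTF8-octets = *( UTF8-char )
fn is_valid_utf8(bytes: &[u8]) -> bool {
    let mut rest = bytes;
    while !rest.is_empty() {
        match utf8_char_len(rest) {
            Some(n) => rest = &rest[n..],
            None => return false,
        }
    }
    true
}
```

For the inputs a sketch like this is typically tried on, the result should agree with `std::str::from_utf8(bytes).is_ok()`, which is the function to use in real code.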
Comments in addresses are discussed in RFC 5322 Appendix A.5. White Space, Comments, and Other Oddities.
An informal description can be found on Wikipedia.
Structs§
- Type representing a single email address. This is basically a wrapper around a String; the email address is parsed for correctness with FromStr::from_str, which is the only way to create an instance. The various components of the email are not parsed out to be accessible independently.
- Struct of options that can be configured when parsing with parse_with_options.
Enums§
- Error type used when parsing an address.